Abstract



Teneurins are type II transmembrane proteins expressed during pattern formation and neurogenesis with an intracellular 
domain that can be transported to the nucleus and an extracellular domain that can be shed into the extracellular milieu. 
In Drosophila melanogaster, Caenorhabditis elegans, and mouse the knockdown or knockout of teneurin expression can 
lead to abnormal patterning, defasciculation, and abnormal pathfinding of neurites, and the disruption of basement 
membranes. Here, we have identified and analyzed teneurins from a broad range of metazoan genomes for nuclear 
localization sequences, protein interaction domains, and furin cleavage sites and have cloned and sequenced the 
intracellular domains of human and avian teneurins to analyze alternative splicing. The basic organization of teneurins is 
highly conserved in Bilateria: all teneurins have epidermal growth factor (EGF) repeats, a cysteine-rich domain, and a large 
region identical in organization to the carboxy-half of prokaryotic YD-repeat proteins. Teneurins were not found in the 
genomes of sponges, cnidarians, or placozoa, but the choanoflagellate Monosiga brevicollis has a gene encoding a predicted 
teneurin with a transmembrane domain, EGF repeats, a cysteine-rich domain, and a region homologous to YD-repeat 
proteins. Further examination revealed that most of the extracellular domain of the M. brevicollis teneurin is encoded on 
a single huge 6,829-bp exon and that the cysteine-rich domain is similar to sequences found in an enzyme expressed by the 
diatom Phaeodactylum tricornutum. This leads us to suggest that teneurins are complex hybrid fusion proteins that 
evolved in a choanoflagellate via horizontal gene transfer from both a prokaryotic gene and a diatom or algal gene, perhaps 
to improve the capacity of the choanoflagellate to bind to its prokaryotic prey. As choanoflagellates are considered to be 
the closest living relatives of animals, the expression of a primitive teneurin by an ancestral choanoflagellate may have 
facilitated the evolution of multicellularity and complex histogenesis in metazoa. 
Introduction

Teneurins are phylogenetically conserved transmembrane 
proteins (see reviews by Tucker and Chiquet-Ehrismann 
2006; Young and Leamey 2009). The name "teneurin" 
honors their discovery in Drosophila melanogaster by 
combining the names of the two dipteran teneurin homologs,
Ten-a (Baumgartner and Chiquet-Ehrismann 1993)
and Ten-m (Baumgartner et al. 1994, also referred to as
Odz [Levine et al. 1994]), with neurons, which are com- 
mon sites of expression (e.g., Minet et al. 1999). In D. mel- 
anogaster, chicken, and mouse the teneurin homologs 
have the following conserved features: 1) teneurins are 
type II transmembrane proteins with an N-terminal intra- 
cellular domain (ICD) and a large extracellular domain 
(ECD); 2) teneurins have eight epidermal growth factor 
(EGF) repeats; 3) the third cysteine residue in the second 
and fifth EGF repeat is replaced with a tyrosine or phenyl- 
alanine residue, which results in the potential for teneur- 
ins to dimerize side-by-side through disulfide bonds 
(Oohashi et al. 1999); and 4) the C-terminal two-thirds 



of teneurins is similar to the YD-repeat proteins of pro- 
karyotes, with characteristic NHL (from NCL-1, HT2A, and 
Lin-41) repeats, tyrosine and aspartate-rich YD repeats, 
and a region similar to the core-associated domain of ret- 
rotransposon hot spot (RHS) proteins. In addition, many 
teneurins can be proteolytically processed, freeing the 
ICD for transport to the nucleus (Bagutti et al. 2003; Nunes 
et al. 2005; Kenzelmann et al. 2008) and/or releasing the 
ECD for incorporation in the extracellular matrix (ECM; 
Rubin et al. 1999; Trzebiatowska et al. 2008). An additional 
cleavage site near the C-terminus can lead to the creation 
of a neuropeptide (reviewed by Lovejoy et al. 2009). 
Proline-rich Src homology 3 (SH3)-binding domains have 
been identified in the ICD of teneurins cloned from chordates 
and ecdysozoans, and ICD-interacting partners have been 
characterized that may mediate associations between ten- 
eurins and the cytoskeleton and methylated DNA (Nunes 
et al. 2005). Mutation analysis and RNAi-mediated knock- 
down of teneurin expression in D. melanogaster and Caeno- 
rhabditis elegans reveal fundamental roles for teneurins in 
pattern formation (Baumgartner et al. 1994; Rakovitsky et al.
2007), axonal fasciculation (Drabikowski et al. 2005), and the
integrity of basement membranes (Trzebiatowska et al.
2008). In chordates, teneurins are best studied in mouse
and chicken, where they are predominantly expressed in the 
developing nervous system in area-specific patterns 
mediated in part by EMX2 (Li et al. 2006; Beckmann 
et al. 2011). Knockout of the gene encoding mouse teneur- 
in-3 by homologous recombination results in abnormal path- 
finding in the visual system and a loss of binocular vision 
(Leamey et al. 2007). 

In order to identify novel features and learn more about 
the potential evolutionary origin of teneurins, we searched 
for and compared sequences encoding teneurin-like pro- 
teins in opisthokont genomes and collections of expressed 
sequence tags (ESTs). We also cloned and sequenced 
cDNAs encoding the ICD of teneurins from human and 
chicken to study alternative splicing. By aligning and ana- 
lyzing proteins for predicted nuclear localization sequences 
(NLSs), SH3-binding domains, and furin-type proteolytic 
cleavage sites, we have refined our knowledge of conserved 
teneurin structure and function. In addition, we identified 
a gene encoding a teneurin in the choanoflagellate Monosiga
brevicollis, which suggests that teneurins may have
played a role in the early evolution of metazoan tissues. 

Materials and Methods 

Sequence Analysis 

Novel teneurin sequences were identified by sequence ho- 
mology using tBLASTn (http://blast.ncbi.nlm.nih.gov/) and 
by domain architecture using Pfam (http://pfam.sanger.ac.uk/)
with "view a family," SMART (http://smart.embl-heidelberg.de/smart/)
with "architecture queries," and Superfamily
(http://supfam.org/SUPERFAMILY/) with "domain combi- 
nations." Boundaries of regions, domains, and repeats were 
determined using Pfam for EGF repeats, NHL repeats, RHS 
repeats (related to YD repeats), RHS protein, Ten_N do- 
mains, and PfamB PB025792 (the region between the trans- 
membrane domain and the EGF repeats, which was 
identified as a phylogenetically conserved region by Pfam), 
and SMART for transmembrane domains and EGF repeats. 
Alignments and phylogenetic relationships were determined 
using ClustalW (http://www.genome.jp/tools/clustalw/) and 
the settings "pair alignment slow/accurate" (gap open penalty 
10, gap extension penalty 0.1). Importin α/β pathway NLSs
were identified using NLS Mapper (http://nls-mapper.iab.keio.ac.jp),
and furin cleavage sites were predicted with ProP
(http://www.cbs.dtu.dk/services/ProP/). SH3-binding domains 
were identified by hand from consensus sequences described 
by others (Kay et al. 2000; Mayer 2001; Kowanetz et al. 2003). 

Reverse Transcriptase PCR and Sequencing 
Human adult brain cDNA was generated from total human
brain RNA (AMS Biotechnology, Oxon, UK) using
Superscript III (Invitrogen, Carlsbad, CA) polymerase and
random hexamer primers (Invitrogen). Sequences corresponding
to the ICDs of teneurins-1 through -4 were
amplified with PfuTurbo polymerase (Stratagene/Agilent
Technologies, Santa Clara, CA) using specific primers
(teneurin-1: 5'-ACTAGCGGCCGCACCATGGAGCAAACTGACTGC-3'/5'-ACTACTCGAGGCAGCACCTGTAAGGTTTG-3';
teneurin-2: 5'-ACTAGCGGCCGCACCATGGATGTAAAGGACCGG-3'/5'-ACTACTCGAGGCAGTATTTGGAGGGCTTC-3';
teneurin-3: 5'-ACTAGCGGCCGCACCATGGATGTGAAAGAACGC-3'/5'-ACTACTCGAGACAGTACTTTGAAGACTTC-3';
teneurin-4: 5'-ACTAGCGGCCGCACCATGGACGTGAAGGAGAGG-3'/5'-ACTACTCGAGACAGTACTTGGAGGGCTTC-3')
including restriction sites NotI and XhoI. Amplified
products were separated on a 0.8% agarose gel, and fragments
were excised, gel extracted, and cloned into pcDNA3.
Positive clones were sequenced using forward primer T7 and
reverse primer Sp6.
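
The primer design described above can be checked mechanically. The following is a minimal sketch, not part of the original protocol: it assumes the primer sequences as transcribed from the text (with extraction spacing and line breaks removed) and simply confirms that each forward primer carries the NotI recognition sequence (GCGGCCGC) and each reverse primer carries the XhoI recognition sequence (CTCGAG). The function name site_report is hypothetical.

```python
# Restriction-site check for the human teneurin ICD cloning primers.
# Sequences are transcribed from the Methods text (5'->3'); the helper
# name below is illustrative and not taken from the original work.

NOTI = "GCGGCCGC"  # NotI recognition sequence
XHOI = "CTCGAG"    # XhoI recognition sequence

# gene -> (forward primer, reverse primer)
PRIMERS = {
    "teneurin-1": ("ACTAGCGGCCGCACCATGGAGCAAACTGACTGC",
                   "ACTACTCGAGGCAGCACCTGTAAGGTTTG"),
    "teneurin-2": ("ACTAGCGGCCGCACCATGGATGTAAAGGACCGG",
                   "ACTACTCGAGGCAGTATTTGGAGGGCTTC"),
    "teneurin-3": ("ACTAGCGGCCGCACCATGGATGTGAAAGAACGC",
                   "ACTACTCGAGACAGTACTTTGAAGACTTC"),
    "teneurin-4": ("ACTAGCGGCCGCACCATGGACGTGAAGGAGAGG",
                   "ACTACTCGAGACAGTACTTGGAGGGCTTC"),
}


def site_report(primers):
    """Map each gene to (forward has NotI site, reverse has XhoI site)."""
    return {
        gene: (NOTI in fwd, XHOI in rev)
        for gene, (fwd, rev) in primers.items()
    }


if __name__ == "__main__":
    for gene, (has_noti, has_xhoi) in site_report(PRIMERS).items():
        print(f"{gene}: NotI in forward={has_noti}, XhoI in reverse={has_xhoi}")
```

A check of this kind is mainly useful for catching transcription errors in primer lists before ordering oligonucleotides.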

The sequence of avian teneurin-3 including alternatively
spliced variants was assembled from overlapping fragments
cloned by polymerase chain reaction (PCR). cDNA was prepared
from total RNA extracted from embryonic day 16
chicken cerebellum using the RNeasy Mini kit (Qiagen,
Germantown, MD). PCR was performed with the Platinum Pfx
DNA Polymerase System (Invitrogen). Five sets of primers
were used to divide avian teneurin-3 into five segments.
Segment 1 used primer pair 5'-ATGGATGTGAAAGAACGTCG-3'/5'-CACGTGGAGGGTAAACGATAA-3';
segment 2 used primer pair 5'-ACTGTGAAGAAGCGGATTGC-3'/5'-GACCGCCAAAAGTCACTAGA-3';
segment 3 used primer pair 5'-TGATGGGACCATGATCAGAA-3'/5'-ACCAGACGGCAGACATGAAC-3';
segment 4 used primer pair 5'-AGCGAGGGACGACTAGTGAA-3'/5'-GGAGAAAGGATAGAGTGAAA-3';
and segment 5 used primer pair 5'-AGGCTGTGGACAGAAGGAGA-3'/5'-GGTCCTCTACTTGGATGACT-3'.
Each segment was TOPO cloned into the pCR-II vector (Invitrogen)
for sequencing.

Results 

Overview: Identification of Teneurins and Analysis 
of Teneurins from Homo sapiens 
Genes encoding teneurin-like proteins and predicted proteins 
with the characteristic domain organization of known ten- 
eurins were identified by sequence similarity (e.g., tBLASTn) 
and by the presence of combinations of domains (e.g., pre- 
dicted proteins with both EGF repeats and NHL repeats using 
Pfam or SMART; for details, see Materials and Methods). Ten- 
eurins identified in this way from chordates are summarized 
in supplementary table S1, Supplementary Material online; 
teneurins from nonchordates are summarized in supplemen- 
tary table S2, Supplementary Material online. 
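
The domain-combination search described above amounts to a set-membership filter. The sketch below is illustrative only: the protein names and domain annotations are hypothetical toy records, the domain labels are informal stand-ins rather than exact Pfam or SMART identifiers, and the function names are assumptions. It shows the logic of keeping only predicted proteins annotated with both EGF repeats and NHL repeats.

```python
# Toy illustration of a domain-architecture query: keep predicted proteins
# whose annotation includes every required domain. All records below are
# hypothetical and exist only to demonstrate the filter.

from typing import Dict, List, Set

REQUIRED_DOMAINS: Set[str] = {"EGF", "NHL"}  # informal labels, an assumption


def has_required_domains(domains: Set[str],
                         required: Set[str] = REQUIRED_DOMAINS) -> bool:
    """True if every required domain is present in the annotation set."""
    return required <= domains


def filter_candidates(annotations: Dict[str, Set[str]]) -> List[str]:
    """Return the sorted names of proteins carrying all required domains."""
    return sorted(
        name for name, domains in annotations.items()
        if has_required_domains(domains)
    )


# Hypothetical annotation table (not real search output).
TOY_ANNOTATIONS = {
    "protein_A": {"EGF", "NHL", "RHS_repeat"},
    "protein_B": {"EGF"},
    "protein_C": {"NHL", "kinase"},
}

if __name__ == "__main__":
    print(filter_candidates(TOY_ANNOTATIONS))
```

Candidates passing such a filter would still need to be confirmed by sequence similarity and alignment, as in the text.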

To illustrate the features of these teneurins identified 
through proteomic analysis, the four teneurins from 
H. sapiens are shown in figure 1. The variant of teneur- 
in-1 shown in figure 1 is a type II transmembrane protein 
with a 317aa N-terminal ICD, a 23aa transmembrane do- 
main, and a 2385aa C-terminal ECD. Within the ICD, there 
are two predicted importin α/β pathway NLSs. The first 
NLS (aa11-40) has an NLS Mapper score of 4.8, and the sec- 
ond NLS (aa60-69) has an NLS Mapper score of 6.0. Higher 
Fig. 1. The human teneurins. Human teneurins share a basic domain 
organization but they differ in the presence of NLSs, SH3-binding 
motifs, and furin cleavage sites. Following processing the ICD of 
teneurin-1, for example, is predicted to be located in the nucleus 
but not the ICD of teneurin-2. Similarly, the ECD of teneurins-2 and 
-3 are predicted to be shed into the ECM but not the ECD of 
teneurins-1 and -4. Accession numbers, the positions of domains as 
well as the NLS and ProP scores of each teneurin are found in 
supplementary table S3, Supplementary Material online. 

scores represent a greater likelihood of nuclear localization 
(Kosugi et al. 2009) and are indicated on the figure with 
progressively darker ovals. Two proline-rich SH3-binding 
domains (indicated by "PP" on fig. 1) are also found in 
the ICD of human teneurin-1. The first (aa193-199; 
RPLPPPP) is consistent with the consensus sequence for 
Class I SH3 ligands (+xφPxφP); the second (aa292-297; 
PRPLPR) is consistent with the atypical SH3-binding motif 
(PxxxPR) of cbl proteins (Kowanetz et al. 2003). In the ECD 
of human teneurin-1, there are eight EGF repeats (aa531- 
796). The third cysteine residue of the second EGF repeat 
has been replaced with a tyrosine residue, and the third 
cysteine residue of the fifth EGF repeat has been replaced 
with a phenylalanine (indicated by the "Y" and "F" in fig. 1). 
This substitution results in the potential for dimerization of 
teneurin monomers through disulfide bonds between cys- 
teines that lack an intramodular partner (Oohashi et al. 
1999). A cysteine-rich domain is found adjacent to the 
eighth EGF repeat (aa797-836). The carboxy two-thirds 
of human teneurin-1 shares the same domain organization 
as the YD-repeat proteins of some prokaryotes (e.g., Myxococcus
xanthus, where it is required for gliding motility 
[Youderian and Hartzell 2007]): five NHL domains, YD- 
repeats (similar to "RHS repeats"), and a region near the 
C-terminal tail identified by Pfam as RHS protein (similar 
to "RHS-associated core domain"). The NHL domains of hu- 
man teneurin-1 were identified by Pfam (dark gray, fig. 1) or 
by alignment using ClustalW (light gray, fig. 1). Similarly, the 
RHS protein domain identified by Pfam in human teneurin-1 
is indicated in dark gray, and those identified in other ten- 
eurins by alignment are indicated by lighter shades. Finally, 



human teneurin-1 has a single predicted furin cleavage site 
with a ProP score at or above 0.55 (threshold = 0.50) at 
aa2618 (LNGRTRR/FA). This would create a 107aa C-terminal 
peptide similar to teneurin C-terminal-associated peptide-1 
(Trubiani et al. 2007). 
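
The numbers quoted above for human teneurin-1 can be cross-checked with a short calculation. The sketch below rests on two stated assumptions: it uses the minimal furin consensus R-X-[K/R]-R as a crude stand-in for ProP (which is a trained predictor, not a pattern match), and it treats aa2618 as the last residue before the cleaved bond. The function name furin_sites is hypothetical.

```python
# Cross-check of the teneurin-1 furin site and C-terminal peptide length.
# The regular expression is the minimal furin consensus R-X-[K/R]-R and is
# only a rough approximation of what ProP evaluates.

import re

# Lookahead so that overlapping motifs are all reported.
FURIN_MOTIF = re.compile(r"(?=R.[KR]R)")


def furin_sites(seq):
    """Return 1-based positions of the residue after which cleavage occurs."""
    # The motif spans 4 residues; cleavage follows its final arginine.
    return [m.start() + 4 for m in FURIN_MOTIF.finditer(seq)]


# Residues flanking the predicted site, written without the '/' marker.
SITE_CONTEXT = "LNGRTRRFA"

# Domain lengths reported for the human teneurin-1 variant in figure 1.
ICD_LEN, TM_LEN, ECD_LEN = 317, 23, 2385
TOTAL_LEN = ICD_LEN + TM_LEN + ECD_LEN

# Assumption: aa2618 is the residue immediately before the cleaved bond.
CLEAVAGE_AFTER = 2618
C_TERMINAL_PEPTIDE_LEN = TOTAL_LEN - CLEAVAGE_AFTER

if __name__ == "__main__":
    print("cleavage after local position(s):", furin_sites(SITE_CONTEXT))
    print("full-length protein:", TOTAL_LEN, "aa")
    print("C-terminal peptide:", C_TERMINAL_PEPTIDE_LEN, "aa")
```

Under these assumptions the motif places the cut between the second arginine and the phenylalanine, and the domain lengths sum to a protein whose C-terminal fragment is 107 residues long, in agreement with the text.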

There are four teneurin genes in H. sapiens. The basic 
organization of teneurins-1, -2, -3, and -4 is the same, most 
notably in the ECD: each teneurin has eight EGF repeats 
with aromatic residues substituting for cysteines in the sec- 
ond and fifth repeat, each has a cysteine-rich domain and 
a C-terminal region similar to the YD-repeat proteins of 
prokaryotes, and each has a predicted furin cleavage site 
near the C-terminus (fig. 1; details can be found in supple- 
mentary table S3, Supplementary Material online). One dif- 
ference is the presence of a second predicted cleavage site 
between the transmembrane domain and EGF repeats of 
teneurins-2 and -3 that would permit shedding of the 
ECD into the ECM; these cleavage sites are not found in 
teneurins-1 and -4. Additional differences are seen in the 
ICD. Teneurins-1, -2, and -4 have proline-rich motifs that 
match consensus SH3-binding domains but teneurin-3 
does not. However, the proline-rich sequence PPTRPLPR 
is found in the ICD of human teneurin-3, which resembles, 
but does not exactly match, known SH3-binding motifs. 
Teneurin-2 does not have a predicted NLS, and the NLS 
of teneurin-3 has a lower NLS Mapper score than the NLSs 
of teneurin-1 and teneurin-4. 
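
The motif matching described above was done by hand; it can be approximated with regular expressions. The following is a sketch under explicit assumptions: "+" is taken to mean arginine or lysine, and the hydrophobic class φ is taken to include proline along with A, V, L, I, M, F, and W, which is a simplification chosen here and not a definition given in the text. The pattern and function names are hypothetical.

```python
# Regex approximation of two SH3-binding consensus motifs:
#   Class I:        + x phi P x phi P
#   cbl (atypical): P x x x P R
# The residue classes below are simplifying assumptions.

import re

PHI = "AVLIMFWP"  # assumed hydrophobic set, proline included
CLASS_I = re.compile(rf"[RK].[{PHI}]P.[{PHI}]P")
CBL_ATYPICAL = re.compile(r"P...PR")


def sh3_motif_classes(seq):
    """Return the names of the consensus motifs found in seq."""
    hits = []
    if CLASS_I.search(seq):
        hits.append("class I")
    if CBL_ATYPICAL.search(seq):
        hits.append("PxxxPR")
    return hits


if __name__ == "__main__":
    # Sequences quoted in the text for human teneurin-1 and teneurin-3.
    for peptide in ("RPLPPPP", "PRPLPR", "PPTRPLPR"):
        print(peptide, sh3_motif_classes(peptide))
```

With these definitions the two teneurin-1 sequences match the class I and PxxxPR patterns, respectively, whereas the teneurin-3 sequence PPTRPLPR matches neither, consistent with the statement that it resembles but does not exactly match known motifs.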

Representative teneurins from major taxonomic groups 
were analyzed in this manner and are described below. 

Identification and Analysis of Chordate Teneurins 
The teneurins of mouse (Mus musculus), chicken (Gallus gallus),
zebrafish (Danio rerio), and the protochordates Ciona
intestinalis and Branchiostoma floridae were identified
and analyzed. There are four teneurins in M. musculus
and G. gallus and they share many of the features described 
above for human teneurins. The few differences include the 
absence of a predicted NLS in murine teneurin-4, and the 
observation that chicken teneurin-2 has a predicted NLS, al- 
beit a weak one (fig. 2A and B; supplementary table S3, Sup- 
plementary Material online). In D. rerio, there are five 
teneurins. ClustalW alignment and basic phylogenomic anal- 
ysis identify two teneurin-2 paralogs that we have named 
teneurin-2A and teneurin-2B (figs. 2C and 3; supplementary 
table S3, Supplementary Material online). The predicted 
sequences of teneurin-1, teneurin-2B, and teneurin-4 appear 
to be complete, but the predicted N-termini of teneurin-2A 
and teneurin-3 were completed by translating potential 
open reading frames and aligning them with known ten- 
eurin sequences and by piecing together ESTs (supplemen- 
tary table S3, Supplementary Material online; FASTA files 
can be found in supplementary table S4, Supplementary 
Material online). As with the chicken teneurins, the basic 
features of the zebrafish teneurins are conserved. Differen- 
ces include the absence of a potential furin cleavage site 
near the C-terminus of teneurin-1, an additional potential 
furin cleavage site between the NHL repeats and YD re- 
peats of teneurin-2A and the absence of predicted NLSs 
Fig. 2. The domain organization and possible relationships of chordate teneurins. There are four teneurins in mouse (Mus musculus) and 
chicken (Gallus gallus). Basic features are highly conserved between mouse (A), chicken (B), and human teneurins (fig. 1). There are five 
teneurin genes in the zebrafish, Danio rerio, including two teneurin-2 paralogs. The ICD of the predicted teneurin-1 does not have an NLS, and 
an additional furin cleavage site is found between the NHL domains and the YD repeats (C). The genomes of the tunicate Ciona intestinalis and 
the cephalochordate Branchiostoma floridae each contain a single teneurin gene (D). 



in the ICDs of teneurin-1 and teneurin-2A. In addition, the 
ICD of D. rerio teneurin-3 has a predicted proline-rich SH3- 
binding domain, unlike the ICDs of teneurin-3 in chicken 
and man. 

To determine if the duplication of teneurin-2 is a com- 
mon feature in bony fish, the teneurins of the stickleback 
Gasterosteus aculeatus were also identified (supplementary 
table S1, Supplementary Material online) and aligned with 
the teneurins of other chordates. Like D. rerio, G. aculeatus
has five teneurin genes. However, there are two teneurin-3
paralogs (teneurin-3A and teneurin-3B) and only one teneurin-2
(fig. 3). The teneurin-1 of G. aculeatus has a potential
furin cleavage site near the C-terminus (aa2642; ProP
score = 0.74), indicating that the absence of this site in
D. rerio may not be typical of teneurin-1 in actinopterygians.
The G. aculeatus teneurin-1 also has a predicted
NLS in the ICD (aa11-41, NLS Mapper score = 4.0).

The genomes of C. intestinalis and B. floridae each encode
a single teneurin (fig. 2D, supplementary table S3,
Supplementary Material online). The basic organization
of these teneurins resembles those of the craniate teneurins.
The ICD of the predicted teneurin from C. intestinalis
has two NLSs: one near the N-terminus and the other near 
the transmembrane domain. A predicted NLS near the 
transmembrane domain is commonly observed in the 
ICD of teneurins from protostomes (see below). The ICDs 
from both of the protochordate teneurins have predicted 
SH3-binding domains, but the ICD from B. floridae does not 
have an NLS that is recognized by NLS Mapper. Both of the 
predicted protochordate teneurins have potential furin 
cleavage sites that would shed the ECD and process the 
C-terminus like those of teneurin-2 and teneurin-3 in fish, 



birds, and man, but like D. rerio teneurin-2A (and unlike 
other craniate teneurins examined) they also have a third 
predicted furin cleavage site near the center of the ECD. A 
second teneurin-like sequence is found in the B. floridae 
genome when two adjacent predicted proteins (XP_002592160
and XP_002592161) are combined. However, the C-terminal
two-thirds of the second predicted protein also







Fig. 3. An unrooted phylogenetic tree based on ClustalW alignment 
of real and predicted teneurin amino acid sequences suggests that 
teneurins-1 and -4 share a common ancestor, as do teneurins-2 and 
-3. The stickleback Gasterosteus aculeatus has five teneurins, but 
unlike Danio rerio it has retained two teneurin-3 paralogs. 






Teneurin Evolution • doi:10.1093/molbev/msr271 



MBE 



[Figure 4, panels A-H: exon maps of the teneurin ICDs. Panel titles and genomic coordinates as labeled: H. sapiens Teneurin-1 ICD (chrX:123,838,927-124,097,602); G. gallus Teneurin-1 ICD (chr4:15,414,755-15,510,910); H. sapiens Teneurin-2 ICD (chr5:166,711,843-167,420,123); H. sapiens Teneurin-3 ICD (chr4:183,245,174-183,549,978); H. sapiens Teneurin-4 ICD (chr11:78,600,894-78,780,989); G. gallus Teneurin-2 ICD (chr13:4,654,066-5,137,533); G. gallus Teneurin-3 ICD (chr4:41,575,777-41,774,119); G. gallus Teneurin-4 ICD (chr1:196,981,052-197,202,627). Accession numbers shown in the figure: HE578281, AJ238613, HE578282 (Variant 1), HE578283 (Variant 2), AK056053, BX648178, HE578285 (Variant 2), HE578287 (Variant 2), BU072782, AJ279031, AJ245711, HE578904 (Variant 1), HE578905 (Variant 2), BX932994.]



Fig. 4. Comparison of alternative splicing in the ICD of human and chicken teneurins. The human teneurin-1 ICD is encoded on five exons. No 
splice variants were found using RT-PCR and primers corresponding to sequences in exon 1 and exon 5 (A). In contrast, RT-PCR reveals two 
variants amplified with primers based on sequences in the first and fifth exon of human teneurin-2. Variant 1 is encoded on 5 exons, whereas 
Variant 2 is composed of four. ESTs suggest an additional way to generate diversity in the ICD of human teneurin-2: An exon found between 
the second and third exon (exon 2B) is associated with long and short variants as well. This may represent an alternative start site for teneurin- 
2 (B). Human teneurin-3 has an ICD encoded on four exons, and exon 2 can be spliced out to generate a second variant (C). RT-PCR with 
primers based on sequences in exon 1 and exon 6 reveal long and short variants of human teneurin-4. Variant 1 is encoded on five exons, 
whereas Variant 2 is encoded on four. Use of an additional exon (exon 3) that does not contain a start codon is found in an EST (D). The ICDs 
of teneurins-1 and -2 from the chicken were described previously. As in human, there is a single teneurin-1 isoform, and there are long and 
short variants of teneurin-2. There is no evidence from chicken of an alternative start site in teneurin-2 (E, F). PCR using cDNA from embryonic 
chicken cerebellum and primers based on sequences in exons 1 and 4 reveal identical variants in chicken and human teneurin-3 (G). Sequence 
from a single EST from the chicken is consistent with the organization of the ICD of human teneurin-4 (H). 



has lysozyme and keratin-related sequences, and some ten- 
eurin and lysozyme-like sequences are encoded on the 
same large predicted exon, which leads us to suggest that 
this is a pseudogene. This is supported by the total absence 
of ESTs. 

Alternatively Spliced Variants 

Previously we showed that there are a number of isoforms 
of avian teneurin-2 and that these variants are derived from 
alternative splicing of regions of transcripts encoding both 
the ICD and the ECD (Tucker et al. 2001). Here, we exam- 
ined ESTs and used RT-PCR to determine if the ICD variants 
are specific for teneurin-2 and if they are conserved in 
mammals and birds. A single PCR product is amplified from 
adult human brain-derived cDNA using primers corre- 
sponding to the 3' end of the first exon of the human 
teneurin-1 gene and the 5' end of the fifth exon, which 
encodes the transmembrane domain (fig. 4A). When the 
same strategy is applied using primers based on human 
teneurin-2 sequences, two variants are observed: Variant 



1 is encoded by all five previously identified exons and Var- 
iant 2 lacks the third exon (fig. 4B). These variants are sup- 
ported by EST data, which also reveal the use of a sixth exon 
found between exon 2 and exon 3. EST sequences contain- 
ing this alternative exon, which we have named exon 2B, do 
not contain sequences corresponding to either exon 1 or 
exon 2. As a putative start codon is found in exon 2B, this 
exon may be used as an alternative start site for teneurin-2 
transcripts (and therefore would not have been amplified 
using our flanking primer pairs). The ICD of teneurin-3 is 
encoded on four exons and like teneurin-2 there are two 
ICD splicing variants: Variant 1 uses all four exons, whereas 
Variant 2 is encoded on exons 1, 3, and 4 (fig. 4C). Finally, 
RT-PCR reveals two alternatively spliced variants of the hu- 
man teneurin-4 ICD. The larger is encoded by five exons, 
and a smaller by four exons. Interestingly, an EST (BU072782) 
shows the potential use of an additional exon that was not 
amplified by our primer pair (fig. 4D). 

ESTs demonstrate that some of the ICD variants we ob- 
served in human teneurin-1 and teneurin-4 are conserved 
in G. gallus, just as our previous work with avian teneurin-2 
showed the presence of two ICD variants (fig. 4E, F, and H). 
To study the alternative splicing of teneurin-3, we used RT- 
PCR to amplify products corresponding to the ICD and 
cDNA derived from embryonic chicken brain. Just as in hu- 
man, the avian teneurin-3 ICD is encoded on four exons. A 
large variant contains sequences corresponding to all four 
exons, and exon 2 is spliced from a smaller variant (fig. 4G). 
Note that we could not identify an exon homologous to 
exon 2B of human teneurin-2 in the chicken genomic se- 
quence, but there is a homologous potential exon 3 in 
chicken teneurin-4 DNA. 

There is also evidence of teneurin variants derived from 
alternative splicing in the region encoding the ECD. For ex- 
ample, a short (8aa) stretch of amino acids can be present 
between the seventh and eighth EGF repeats of teneurin-2 
from the chicken, and at least one variant of avian teneurin-2
is truncated between the seventh and eighth EGF repeats, 
resulting in an isoform lacking the cysteine-rich domain 
and the region homologous to the YD repeat proteins 
of prokaryotes (Tucker et al. 2001). Alternative splicing that 
results in additional sequence between the seventh and 
eighth EGF repeats may be common in teneurins, as similar 
variants are found in mRNA sequences in mouse teneurin-3
(NP_035987) and teneurin-4 (BAE28005). However, there 
is no evidence supporting grossly truncated isoforms of 
teneurins in other species. 

Identification and Analysis of Teneurins from an 
Echinoderm and Protostomes 

The same methods used to identify teneurins in chordates 
were applied to other metazoan sequences. In addition to 
the known teneurins of D. melanogaster and C. elegans, predicted
complete or partial teneurins were found in the purple
sea urchin (Strongylocentrotus purpuratus), a mollusk
(Lottia gigantea), an annelid (Capitella teleta), a trematode 
(Schistosoma mansoni), and a wide variety of nematodes 
and arthropods (supplementary table S2, Supplementary 
Material online). Interestingly, no teneurin-like sequences 
were identified in the genomes or ESTs from cnidarians, 
ctenophores, sponges, or Trichoplax adhaerens. 

The single teneurin from S. purpuratus is remarkable for 
a few features not seen in teneurins from chordates: 1) it has 
only six EGF repeats and only the second EGF repeat has an 
aromatic residue substituting for a cysteine residue; 2) it 
lacks a predicted furin cleavage site near the C-terminus; 
and 3) it has two predicted furin cleavage sites between 
the transmembrane domain and the EGF repeats (fig. 5A). 
Like the teneurins from protochordates, it has a predicted 
furin cleavage site near the center of the ECD. 

The teneurins from C. elegans and D. melanogaster are 
well known, and the sequences that were analyzed here 
came from cDNAs. There are two teneurins from C. elegans, 
Ten-1L and Ten-1S; they are encoded by the same gene, but 
two different promoters regulate the expression of "long" 
and "short" transcripts (Drabikowski et al. 2005). Ten-1L is 
illustrated in figure 5A; Ten-1S would be identical except 
the ICD is much smaller (the approximate location of 
Fig. 5. The domain organization of nonchordate teneurins. The 
ECDs of teneurins from the purple sea urchin (Strongylocentrotus 
purpuratus), Caenorhabditis elegans, Drosophila melanogaster, and 
the trematode Schistosoma mansoni have the same basic 
organization as chordate teneurins, but there is variation in the 
number of EGF repeats. Aromatic residues substitute for cysteines in 
at least one EGF repeat in all the species examined except the fluke, 
which suggests that the teneurin from S. mansoni does not dimerize 
(A). The genome of the choanoflagellate Monosiga brevicollis 
encodes a protein with a domain organization that is identical to 
metazoan teneurins. The predicted protein does not have furin 
cleavage sites or SH3-binding domains, and its EGF repeats contain 
a full complement of cysteines. The predicted M. brevicollis teneurin 
is encoded on just four exons, and most of the ECD is encoded on 
a single mega-exon of 6829 bp (B). 

the N-terminus of Ten-1S is indicated by the crooked arrow 
in fig. 5A). Ten-1L and the two D. melanogaster teneurins, 
Ten-m and Ten-a, have putative SH3-binding domains and 
one or more NLS. Unlike the NLSs of most chordate teneurins,
the NLSs from the ecdysozoans tend to be found 
near the transmembrane domain and not at the N-terminus. 
The ECDs of these teneurins are similar to those found in 
chordate teneurins: note that both Ten-m and Ten-a (but 
not Ten-1L) have potential furin cleavage sites near the
C-terminus, and both Ten-1L and Ten-a have potential furin 
cleavage sites that could shed the ECD into the ECM. The 
fifth EGF repeat of C. elegans Ten-1L is incomplete; the part 
of this repeat encoding both the second and third cysteine 
residues in other teneurins is missing. Also, the part of the 
ECD near the C-terminus that is predicted by Pfam to be ho- 
mologous to the RHS core-associated protein domain is more 
unlike this domain in other teneurins, though some identity 
could be found by alignment. 

Two teneurin genes that encode predicted proteins that 
align with either D. melanogaster's Ten-a or Ten-m were 
identified in the genomes of a number of insect species, 
including Apis mellifera (honey bee), Tribolium castaneum 
(flour beetle), and the mosquitoes Aedes aegypti and Culex 
quinquefasciatus (supplementary table S2, Supplementary 
Material online). However, single teneurin genes were 
found in the genomes of the branchiopod crustacean 
Daphnia pulex and the arachnid Ixodes scapularis (deer 
tick). This suggests that the duplication of teneurins in 
Table 1. Alignment of Representative Sequences with the Cysteine-Rich Domain Core Sequence of Monosiga brevicollis Teneurin.

Species (common name)                               Protein                     Core Sequence             % Id   % Sim
Monosiga brevicollis (choanoflagellate)             Teneurin                    CNDGIDNDNDRVTDCNDADCCSS
Phaeodactylum tricornutum (diatom)                  Endo-1,3-beta-glucosidase   CNDGIDNDNDGLFDCEDPDCAND   65%    91%
Schistosoma mansoni (fluke)                         Teneurin                    CDDGIDNDHDDLVDCLDPDCCTS   65%    91%
Caenorhabditis elegans (roundworm)                  Ten-a                       CDDGLDNDSDGLIDCDDPECCSS   61%    91%
Homo sapiens (human)                                Teneurin-1                  CGDNLDNDGDGLTDCVDPDCCQQ   57%    91%
Volvox carteri (volvox)                             Gametolysin                 CDDGIDNDCDGLVDMDDPDCNTS   57%    83%
Strongylocentrotus purpuratus (purple sea urchin)   Teneurin                    CTDEVDNDGDSLIDCEDPDCCLS   57%    83%
Pfam: Cu-binding_MopE                               PF11617*                    C.DGVDNNCDGQVD            54%    77%

Note: *The Cu++-binding consensus domain of MopE and related proteins.




the protostome lineage is limited to insects and is not a fea- 
ture of all arthropods. 
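
The % Id column of table 1 can be checked directly from the core sequences. The following is a minimal sketch, assuming that identity is counted position by position over the 23 ungapped residues and rounded to the nearest percent; the function and variable names are illustrative. The Pfam consensus is left out because it contains a gap, and % Sim is not computed because the similarity scoring scheme is not stated here.

```python
# Core sequences as listed in table 1 (23 residues each, no gaps).
REFERENCE = "CNDGIDNDNDRVTDCNDADCCSS"  # Monosiga brevicollis teneurin

CORES = {
    "Phaeodactylum tricornutum (diatom)": "CNDGIDNDNDGLFDCEDPDCAND",
    "Schistosoma mansoni (fluke)": "CDDGIDNDHDDLVDCLDPDCCTS",
    "Caenorhabditis elegans (roundworm)": "CDDGLDNDSDGLIDCDDPECCSS",
    "Homo sapiens (human)": "CGDNLDNDGDGLTDCVDPDCCQQ",
    "Volvox carteri (volvox)": "CDDGIDNDCDGLVDMDDPDCNTS",
    "Strongylocentrotus purpuratus (purple sea urchin)": "CTDEVDNDGDSLIDCEDPDCCLS",
}


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions with identical residues in two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length (no gap handling)")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    for name, seq in CORES.items():
        print(f"{name}: {percent_identity(REFERENCE, seq):.0f}% identical")
```

Under these assumptions the script gives the same rounded identities as the table for all six sequences.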

The trematode S. mansoni has a single predicted teneurin 
(fig. 5A; supplementary table S3, Supplementary Material on- 
line) that has a number of distinctive features. Its ICD is rel- 
atively short, and it does not contain predicted SH3-binding 
motifs or an NLS. Like many teneurins it has a putative furin 
cleavage site between the transmembrane domain and the 
EGF repeats, but it also has a second predicted furin cleavage 
site amidst the YD repeats. Finally, the fluke teneurin has 
only four EGF repeats, and all four EGF repeats have a full 
complement of cysteine residues, so unlike other teneurins 
studied to date it probably fails to dimerize. 
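
To illustrate what a candidate furin site looks like at the sequence level, the sketch below scans a protein sequence for the commonly cited minimal furin motif R-X-(K/R)-R. It is not the ProP method used in this study: ProP is a trained predictor that scores sequence context, whereas this is a plain motif search that will both overcall and miss sites. The motif choice, the function name, and the toy sequence are assumptions made for illustration and are not taken from any teneurin.

```python
import re

# Commonly cited minimal furin recognition motif: Arg-X-(Lys/Arg)-Arg,
# with cleavage expected after the final Arg. A lookahead is used so that
# overlapping motifs are all reported.
FURIN_MOTIF = re.compile(r"(?=(R.[KR]R))")


def candidate_furin_sites(sequence: str):
    """Return (position, motif) pairs for each motif hit.

    position is the 1-based index of the final Arg of the motif,
    i.e. the residue after which cleavage would be expected.
    """
    seq = sequence.upper()
    return [(m.start() + 4, m.group(1)) for m in FURIN_MOTIF.finditer(seq)]


if __name__ == "__main__":
    toy_sequence = "MSTAGRSKRDLLPQRARRAV"  # hypothetical sequence, not a teneurin
    for position, motif in candidate_furin_sites(toy_sequence):
        print(f"candidate site {motif} ending at residue {position}")
```

A scan of this kind can only flag candidates; whether a site lies between the transmembrane domain and the EGF repeats, or among the YD repeats, still has to be read off the domain map.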

A Teneurin Is Encoded in the Genome of the
Choanoflagellate M. brevicollis

The absence of teneurin genes in the complete and assem- 
bled genomes of the cnidarian Nematostella vectensis and 
placozoan T. adhaerens (that could be identified using 
the search methods employed to find teneurins in other 
metazoans) initially suggested that teneurins may have 
evolved about the time of the Cambrian radiation. However, 
during a routine search of predicted protein domain archi- 
tectures that included RHS core-associated protein domains 
using the Pfam program, a sequence encoding EGF repeats, 
NHL domains, and YD repeats (in addition to the RHS core- 
associated protein domain) was identified in the genome of 
the choanoflagellate Monosiga brevicollis. Further analysis 
revealed that this predicted protein has the basic features 
of a metazoan teneurin: it is a type II transmembrane protein 
with a putative NLS in the ICD, eight EGF repeats, a cysteine- 
rich domain, and a C-terminal two-thirds with the same do- 
main architecture as a prokaryotic YD-repeat protein (i.e., 
NHL domains, YD repeats, and an RHS core-associated protein
domain; fig. 5B; supplementary table S3, Supplementary
Material online). The predicted sequence (XP_001749414)
is shown in its entirety in supplementary figure S1, Supplementary
Material online, together with relevant alignments
generated with ClustalW. The expression of the M. brevicollis
teneurin is supported by two nonoverlapping ESTs
(FE890769 and FE895158), both of which correspond to regions
encoding the YD repeats.

The ICD of the M. brevicollis teneurin does not align significantly
with the ICDs from other teneurins, and it lacks
SH3-binding motifs. In addition, ProP fails to identify any potential
furin cleavage sites in this teneurin. There are eight EGF
repeats, but there are no free cysteines to support dimerization.
Adjacent to the EGF repeats is a cysteine-rich region that
is highly conserved: the exact 23aa consensus sequence
ExxCx(D/N)xxDx(D/E)xDxxxDCxxx(D/E)CCxxxxCxxxxxC is
found in all the teneurins analyzed except for S. purpuratus
teneurin (which has one additional "x" between the fourth
and fifth cysteine) and S. mansoni teneurin (which is missing
the sixth cysteine). In fact, using the M. brevicollis cysteine-rich
domain sequence in a tBLASTn search of all nucleotide sequences
uncovers all the teneurins identified above that
are listed on GenBank. However, neither this method nor
the other search methods we employed to identify teneurin
sequences revealed teneurins in sequences from sponges, placozoans,
ctenophores, cnidarians, fungi, ichthyospores or nucleariids.
Interestingly, a similar cysteine-rich sequence is
found in an endo-1,3-beta-glucosidase from the diatom
Phaeodactylum tricornutum (XP_002181321). This sequence
is 46% identical and 71% similar to the 35aa cysteine-rich domain
of M. brevicollis and it includes a core stretch of 23aa
that is 65% identical and 91% similar (table 1). This 23aa core
domain aligns well with the Cu++-binding motif of MopE
(Helland et al. 2008), and similar sequences are found in
the metal-binding region of a predicted gametolysin from Volvox
carteri (table 1 [XP_002958497]) and Chlamydomonas
reinhardtii (XP_001695639). Alignment of the 23aa motif reveals
that the core of the cysteine-rich region of M. brevicollis
teneurin is most similar to the diatom sequence and the core
of the cysteine-rich region of trematode teneurin; the same
regions from other metazoan teneurins are conserved but
not to the same extent (table 1).

Table 2. Alignment of Most Similar Sequences with YD Repeats from Monosiga brevicollis Teneurin (a)

M. brevicollis                    YDVDGQLTQVLEDGAEVESYSYDVNGNRVAWNVRGAAHSATYGADDA
Syntrophobacter fumaroxidans (b)  YDSLGRLLAVRLDGVPAEEYRYDVNGNRVEETNTPRGIIGRTSTYSEEDH

M. brevicollis                    VFTVDGQSYAVDVDGFLTSVRGMSLAYSGRGELLSATLPSGAGTVR
S. fumaroxidans                   LLTSGGTVYRYDADGFLTTRTEGSAVTRYVYSSRGELLSVALPDGK-RIE

M. brevicollis                    YRYDGFGRRI
S. fumaroxidans                   YVNDPLGRRI    46% identical/69% similar

M. brevicollis                    YDVDGQLTQVLEDGAEVESYSYDVNGNRVAWNVRGAAHSATYGAD
Desulfococcus oleovorans (c)      YDEMGRLETVTKDGTLVESYSYDSTPYGTCTYQMNTLRGIAGRVLDYDAE

M. brevicollis                    DAVFTVDGQSYAVDVDGFLTSVRGMSLAYSGRGELLSATLPSGAGT
D. oleovorans                     DHLLSAGGTDYQYDLDGFLTSKTSGAETTYYDYSSRGELLSVDLPDGT-D

M. brevicollis                    VRYRYDGFGRRI
D. oleovorans                     ITYVHDPLGRRI    43% identical/67% similar

M. brevicollis                    YDVDGQLTQVLEDGAEVESYSYDVNGNRVAWNVRGAAHS-ATYGADDAV
Homo sapiens Ten-4                YDADGQLQTVSINDKPLWRYSYDLNGNLHLLSPGNSARLTPLRYDIRDRI

M. brevicollis                    FTVDGQSYAVDVDGFLTSVRGMSLAYSGRGELLSATLPSGAGTVRYRYDG
H. sapiens Ten-4                  TRLGDVQYKMDEDGFLRQRGGDIFEYNSAGLLIKAYNRAGSWSVRYRYDG

M. brevicollis                    FGRRI
H. sapiens Ten-4                  LGRRV    38% identical/63% similar

(a) As determined by tBLASTn with aa2276-aa2379 of M. brevicollis teneurin against the NCBI Nucleotide Collection database and aligned with ClustalW 2.1.
(b) YD protein from genomic sequence CP000478 (4941818-4942148).
(c) YD protein from genomic sequence CP000859 (1374170-1373806).
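
The cysteine-rich consensus quoted in the text, ExxCx(D/N)xxDx(D/E)xDxxxDCxxx(D/E)CCxxxxCxxxxxC, can be turned into a regular expression and tested against the core sequences of table 1. The sketch below does this under one assumption that should be checked against the full alignments: that the 23-residue core of table 1 corresponds to positions 4-26 of the consensus, which is how the M. brevicollis core lines up with it. Function names are illustrative.

```python
import re

CONSENSUS = "ExxCx(D/N)xxDx(D/E)xDxxxDCxxx(D/E)CCxxxxCxxxxxC"


def tokenize(consensus: str):
    """Split the consensus into one token per residue position."""
    return re.findall(r"\([A-Z](?:/[A-Z])+\)|[A-Za-z]", consensus)


def token_to_regex(token: str) -> str:
    """'x' matches any residue, '(D/N)' becomes the class [DN], letters match themselves."""
    if token == "x":
        return "."
    if token.startswith("("):
        return "[" + token[1:-1].replace("/", "") + "]"
    return token


def to_regex(tokens):
    return re.compile("".join(token_to_regex(t) for t in tokens))


TOKENS = tokenize(CONSENSUS)
# Assumption: the table 1 core spans consensus positions 4-26 (1-based).
CORE_PATTERN = to_regex(TOKENS[3:26])

TABLE1_CORES = {
    "M. brevicollis teneurin": "CNDGIDNDNDRVTDCNDADCCSS",
    "P. tricornutum endo-1,3-beta-glucosidase": "CNDGIDNDNDGLFDCEDPDCAND",
    "S. mansoni teneurin": "CDDGIDNDHDDLVDCLDPDCCTS",
    "C. elegans teneurin": "CDDGLDNDSDGLIDCDDPECCSS",
    "H. sapiens teneurin-1": "CGDNLDNDGDGLTDCVDPDCCQQ",
    "V. carteri gametolysin": "CDDGIDNDCDGLVDMDDPDCNTS",
    "S. purpuratus teneurin": "CTDEVDNDGDSLIDCEDPDCCLS",
}

if __name__ == "__main__":
    print("core pattern:", CORE_PATTERN.pattern)
    for name, seq in TABLE1_CORES.items():
        print(f"{name}: {bool(CORE_PATTERN.fullmatch(seq))}")
```

With this slice of the consensus, the teneurin cores in table 1 match, while the diatom and Volvox sequences do not: both lack one of the cysteines required by the pattern. The exceptions noted in the text for S. purpuratus and S. mansoni lie beyond position 26, so they do not affect the core match.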

The NHL repeats and YD repeats of M. brevicollis align better
with the YD-repeat proteins of some prokaryotes than
with the YD repeats found in metazoan teneurins. The
NHL domains are most similar (31% identical) to the YD-repeat
protein of Herpetosiphon aurantiacus (ABX04679),
a predatory filamentous chloroflex bacterium that lives in soil
and freshwater. A stretch of 103aa corresponding to YD repeats
17-21 of M. brevicollis teneurin was analyzed further
using tBLASTn and the entire NCBI nucleotide collection. This
stretch is most similar to the YD-repeat proteins of Syntrophobacter
fumaroxidans (a freshwater bacterium) and Desulfococcus
oleovorans (which lives in coastal waters) and then to the
YD repeats found in human teneurin-4 (table 2).

The M. brevicollis teneurin is predicted from sequences
encoded on just four exons that are separated by introns of
129, 206, and 105 bp (in contrast, human teneurin-1 is encoded
on 29 exons and the average intron is 8 kb). Remarkably,
the region of the predicted protein corresponding to
the four C-terminal EGF repeats, the cysteine-rich domain,
the NHL repeats, the YD repeats, and the RHS core-associated
protein domain is encoded on a single giant
exon of 6829 bp (fig. 5B). For comparison, the
corresponding regions of human teneurin-1, S. mansoni
teneurin, and C. elegans Ten-1L are encoded on 20, 21,
and 7 exons, respectively (see also Minet and Chiquet-Ehrismann
2000).

Discussion 

Here, we have used predictions based on proteomics to determine
which teneurins may be processed such that the
ECD becomes incorporated into the ECM and which teneurins
may be processed such that the ICD is transported
to the nucleus. Our predictions are validated by our previous
experimental studies with avian teneurins. For example,
we showed that a recombinant fusion protein with the
ECD of chicken teneurin-2 was cleaved in vitro at a furin
site between the transmembrane domain and the EGF repeats
(Rubin et al. 1999). Consistent with this observation,
we also showed that antibodies to the ECD of chicken
teneurin-2 labeled not only the cell surface but also the
ECM of chicken embryos (Tucker et al. 2001). When tagged
chicken teneurin-2 ICD is overexpressed in HT1080 cells, the
recombinant ICD localizes to the nucleus (Bagutti et al.
2003), but there is no evidence published to date that
the teneurin-2 ICD is processed and transported to the cell
nucleus in vivo. In contrast, antibodies to the ECD of
chicken teneurin-1 failed to stain the ECM, but antibodies
to the ICD of teneurin-1 routinely stained the nuclei of cells
in vitro and in histological sections of embryos (Kenzelmann
et al. 2008). Moreover, when the sequence RKRK in the avian
teneurin-1 ICD is mutated to AAAA, it no longer localizes
to the nucleus in vitro (Kenzelmann et al. 2008). Here,
we show that the ICD of chicken teneurin-1 (specifically,
the RKRK and flanking sequences) is predicted with a high
likelihood to be located in the nucleus (NLS Mapper score =
9.0) and that the ICD of chicken teneurin-2 is much less likely
to be nuclear (NLS Mapper score = 2.7). Similarly, the
chicken teneurin-2 furin cleavage site that we previously
demonstrated to be functional is predicted by ProP to
be active (score = 0.65), but no such site is found in
chicken teneurin-1. Thus, teneurin-1 and/or teneurin-4
are most likely to be processed (by an as yet unknown mechanism)
so that the ICD can move to the nucleus, and
teneurin-2 and/or teneurin-3 are more likely to have the
ECD shed into the ECM. The shared features of these pairs
of teneurins are also reflected in their predicted origins:
teneurins-1 and -4 appear to have evolved from a gene
duplication, as do teneurins-2 and -3 (fig. 3; see also Minet
and Chiquet-Ehrismann 2000). All the chordate teneurins
examined here except D. rerio teneurin-1 are likely to be
cleaved near the C-terminus. This may be a step that precedes
the formation of the teneurin-derived C-terminal
neuropeptides characterized by others (reviewed by
Rotzinger et al. 2010).

Using a yeast two-hybrid screen, Nunes et al. (2005) found
that the SH3 domains of CAP/ponsin interact with the second
proline-rich SH3-binding motif of chicken teneurin-1;
the identical motif is present in teneurin-4. CAP/ponsin
in turn binds to vinculin, which could anchor the ICD of
teneurins to the actin cytoskeleton. A predicted SH3-binding
motif at the same location in teneurin-2 does not bind CAP/ponsin
even though it varies from the motif in teneurins-1
and -4 by only a single amino acid (Nunes et al. 2005). This
led us to analyze teneurins from a broad range of taxa for
SH3-binding motifs. The teneurin ICDs from each species
examined, except for S. mansoni and M. brevicollis, contained
one or more consensus SH3-binding motifs. Interestingly,
S. mansoni and M. brevicollis are the only species examined
with teneurins lacking the capacity to dimerize. Perhaps dimerization
is necessary for the ICD-interacting proteins to
link teneurins to the cytoskeleton or to regulate the processes
necessary for ICD nuclear localization.

Databases (e.g., GenBank, Ensembl, JGI, UniProt) contain
listings for numerous teneurin variants. Most of these variants
are based on predicted sequences, but some are based
on cDNAs and ESTs. Here, we chose to study the range of
alternative splicing in the ICD of human and chicken teneurins
by PCR. The ICDs of human and chicken teneurins tend to
be encoded on two pairs of neighboring exons separated by
a large intron. Additional exons, which often are not conserved
between birds and humans and which frequently are subjected
to alternative splicing, are sometimes found between
the two pairs of exons. Alternatively spliced exons do not
contain recognizable SH3-binding motifs or NLSs, so
the significance of these variations is not clear. Interestingly,
an alternatively spliced exon in human teneurin-2 may represent
an alternative start site, as ESTs with this sequence do
not contain sequence from exons 1 or 2, and sequences encoded
on this exon are not found in the PCR products amplified
using a primer based on sequences found in exon 1.
A similar method for generating teneurin splice variants was
shown previously for C. elegans (Drabikowski et al. 2005).

The extraordinary diversity of teleost fish is commonly 
attributed to the duplication of their genome followed by 
the selective retention of certain duplicated genes (see 
Jozefowicz et al. 2003; Postlethwait et al. 2004; Volff 
2005). This has been supported by studies of Hox genes 
(Kurosawa et al. 2006; Zou et al. 2007). In contrast, comparisons
of Sox genes in the zebrafish D. rerio and the stickleback
G. aculeatus (Cresko et al. 2003) show the mutual
retention of Sox9a and Sox9b, albeit with subtle differences 
in their patterns of expression. Here, we show that the 
zebrafish has retained genes encoding teneurin-2A and 
teneurin-2B, whereas the stickleback has retained genes 
encoding teneurin-3A and teneurin-3B. Current models 
of selective gene retention in teleosts predict that genes 
are preserved following degeneration of regulatory ele- 
ments and the partitioning of function between the dupli- 
cated gene products (Force et al. 1999). It is likely that 
a large, multifunctional protein like a teneurin would be 
selected in this way, and differential retention and expres- 
sion could contribute to speciation. 

Previously we speculated that the RHS proteins of bacteria,
which share significant sequence homologies with the
C-terminal portion of teneurins, may have evolved from
horizontal gene transfer from a metazoan teneurin to a symbiotic
or pathogenic prokaryote (Minet and Chiquet-Ehrismann
2000). However, the presence of a teneurin gene
in M. brevicollis, but not in the other nonmetazoan opisthokonts
(e.g., fungi), suggests that the gene evolved in a choanoflagellate,
and that the horizontal gene transfer was from
a prokaryote to a eukaryote instead of the other way around.
Horizontal gene transfer between predatory M. brevicollis
and their prokaryotic prey has been described previously.
For example, Foerstner et al. (2008) reported that a nitrile
hydratase is encoded in the M. brevicollis genome that is
most closely related to enzymes from proteobacteria; the
absence of this enzyme from other eukaryotic genomes
strongly implies horizontal gene transfer from prokaryotic
prey to eukaryotic predator. Over a hundred genes originating
from haptophytes and diatoms have also been found in
M. brevicollis (Nedelcu et al. 2008; Sun et al. 2010), indicating
that gene transfer may be a relatively common occurrence in
these organisms. In fact, this may explain the origin of the
highly conserved cysteine-rich domain, which is nearly identical
to part of an enzyme from the diatom P. tricornutum
(and is also similar to an enzyme in V. carteri) but is only
found in teneurins in metazoa. If this is the case, M. brevicollis
teneurin originated as a fusion protein acquired by horizontal
gene transfer from both a prokaryote and a diatom or
alga.

Choanoflagellates are believed to be the closest living relatives
of metazoans (King and Carroll 2001; Philippe et al.
2004; King et al. 2008). The presence of teneurins (which
have been shown to play roles in cell-cell and cell-ECM interactions
in a variety of tissues) on the surface of an ancestral
choanoflagellate may have facilitated the evolution of
metazoan multicellularity and the development of complex
tissues. Similar roles have been proposed for cadherins,
which appear to have evolved in a choanoflagellate as well
(Abedin and King 2008). The two cadherins of M. brevicollis
are found in the microvilli that form the feeding collar that
surrounds the base of the flagellum, which has led to the
hypothesis that this family of proteins, which is indispensable
in the formation of meaningful cell-cell contacts in animal
tissues, evolved as a means of catching prey. Teneurins
may have evolved to do something similar. YD-repeat proteins
are found on the surface of aquatic bacteria, and in
vitro studies with eukaryotic cells show that teneurin expression
leads to increased cell-cell adhesion (Rubin et al.
2002). The acquisition of the carbohydrate-rich YD-repeat
proteins from a prokaryote by a choanoflagellate may have
improved "fishing" for bacterial prey in the feeding collar. It
will be interesting to determine where M. brevicollis teneurin
is expressed to test this hypothesis.

The lowest branches of the metazoan tree of life include
the ancestors of sponges, ctenophores, and cnidarians.
Therefore, it is puzzling that we were able to identify teneurins
in a choanoflagellate and in all the available genomes
of Bilateria (i.e., deuterostomes and protostomes) but not
in modern sponges or cnidarians. It is possible that our
search methods were insufficient to find them. More likely,
they are present in some of these organisms but not in the
organisms with complete and well-assembled genomes like N.
vectensis. It will be important to scrutinize newly sequenced
and assembled sponge and cnidarian genomes for teneurin
genes as they become available. Another possibility is that teneurins
evolved in a relatively advanced common ancestor of
protostomes and deuterostomes, after the evolution of
sponges and cnidarians. In this scenario, the teneurin in
M. brevicollis would have been acquired by horizontal gene
transfer from metazoan-derived detritus and not a prokaryote.
Evidence against this hypothesis includes the relative similarity
of the core region of the cysteine-rich domain to a diatom
enzyme and of the YD repeats to YD-repeat proteins from bacteria,
as well as the organization of the M. brevicollis teneurin
gene: most of the ECD is encoded on a single huge exon, not
unlike the YD-repeat proteins of prokaryotes, and unlike the
ECD of metazoan teneurins. Others (King et al. 2008) have
reported that the number of introns per gene in M. brevicollis
is similar (6.6) to the number found in human genes
(7.7), so the unusually large exon encoding the ECD of M.
brevicollis teneurin argues for an origin by horizontal gene
transfer from a prokaryote rather than a metazoan, followed by
the loss of teneurins from the genomes of modern sponges and
cnidarians.

Teneurins are phylogenetically conserved among Bilateria,
where they have been demonstrated to play critical roles in
pattern formation, the organization of the ECM, and the development
of the nervous system. Genomic analysis reveals
an ancient origin of teneurins in single-celled choanoflagellates
that may have assembled teneurins via horizontal gene
transfer from two of their prey: diatoms and prokaryotes. Thus,
the talent for gene acquisition by an ancestral choanoflagellate,
perhaps to diversify its metabolic pathways and improve
its ability to capture prey, may have contributed to
the development of multicellularity in metazoans.

Supplementary Material 

Supplementary tables S1-S4 and figure S1 are available at
Molecular Biology and Evolution online
(http://www.mbe.oxfordjournals.org/).